
Budding
planted May 2, 2026 · tended May 6, 2026
#research #ai #mcp #agents #claude-code #sandboxing #cloudflare #api-design

MCP — The Mega Context Problem

A synthesis of Matt Carey's talk on Cloudflare's MCP work. The talk is the most direct naming I've seen of an emerging anti-pattern — and the most concrete fix.

Source: AI Engineer talk, "MCP = Mega Context Problem", Matt Carey (Cloudflare). YouTube.

The problem, named

  • Cloudflare's OpenAPI spec is 2.3M tokens. Naively converted to MCP tool definitions, it's still ~1.1M tokens β€” too big for any agent context.
  • The industry's hack (Cloudflare did it too): split one company into many small MCP servers. Cloudflare had 16 servers, ~2,600 endpoints in total. The user has to pick which server to use; coverage is incomplete by design β€” a product suite might publish 6 tools when its real API has 30 endpoints.
  • This is not an MCP problem β€” it's an agent problem. MCP is a protocol. The bug is "dumping loads of tools into context." The same protocol can serve progressive discovery if clients implement it.

Three approaches to progressive discovery

| Approach | Pattern | Catch |
|----------|---------|-------|
| CLIs | Self-discoverable via --help. Used by Claude Code, OpenClaude. | Requires shell access. |
| Tool search | Keyword-match the user query → load K≈8 tools into context on demand. Claude Code's pattern. | Most loaded tools won't be used (~500 / 2,100 tokens active). |
| Code mode | Generate a typed SDK from the OpenAPI spec. Expose one tool, run_code. Agent writes TypeScript against typed signatures; the server executes that code in a sandbox. | Untrusted code execution. (Solved by V8 isolates — see below.) |
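The tool-search row reduces to a small retrieval step. A minimal sketch, assuming a naive keyword scorer: the ToolDef shape, the scoring rule, and the K default here are my illustration, not Claude Code's actual implementation.

```typescript
// Illustrative "tool search": score every tool definition against the user's
// query and load only the top K into context. Names and scoring are invented.
interface ToolDef {
  name: string;
  description: string;
}

function searchTools(query: string, tools: ToolDef[], k = 8): ToolDef[] {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  return tools
    .map((tool) => {
      const haystack = `${tool.name} ${tool.description}`.toLowerCase();
      // Count how many query terms appear in the tool's name or description.
      const score = terms.filter((t) => haystack.includes(t)).length;
      return { tool, score };
    })
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.tool);
}
```

The point of the pattern is the slice at the end: the other ~2,590 endpoint definitions never touch the context window.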

Why code mode took a while to catch on — and what unblocked it

"Running untrusted code is mega mega scary." — Carey

Untrusted code can read the filesystem, exfiltrate secrets, infinite-loop, run a crypto miner. Historical workarounds — DSLs, heavy VMs, code review — were too expensive to justify the win.

The new primitive that fixes it: V8 isolates, as in Cloudflare Workers, Deno, and Pydantic Monty. Lightweight, fast to spin up, fully sandboxed, with programmable guardrails: flip a boolean to allow or deny network access; pass a function to allow only specific domains.

Live-demo proof from the talk: a generated worker tries process.env → no secrets leak. Tries fetch() → "this worker is not permitted to access the internet." Flip the flag → it can.
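The demo's two failure modes map onto a tiny policy check. This is a sketch of the idea only: the NetworkPolicy shape and function names are invented here, not Cloudflare's actual Workers API.

```typescript
// Illustrative guardrail: network access gated by a programmable policy.
// The policy shape is hypothetical, not Cloudflare's real API surface.
type NetworkPolicy =
  | { allowNetwork: false }
  | { allowNetwork: true; allowDomain?: (host: string) => boolean };

// Throws if the policy forbids the request; mirrors the demo's error message.
function checkPolicy(policy: NetworkPolicy, url: string): void {
  if (!policy.allowNetwork) {
    throw new Error("this worker is not permitted to access the internet");
  }
  const host = new URL(url).hostname;
  if (policy.allowDomain && !policy.allowDomain(host)) {
    throw new Error(`domain not allowed: ${host}`);
  }
}

function guardedFetch(policy: NetworkPolicy): (url: string) => Promise<unknown> {
  return (url) => {
    checkPolicy(policy, url); // deny before any bytes leave the sandbox
    return fetch(url);
  };
}
```

"Flip a boolean" is the allowNetwork field; "pass a function" is the allowDomain predicate.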

Why one run_code tool beats hundreds of function tools

  • Degrees of freedom. A TypeScript snippet calling 5 endpoints in a loop with conditional branching is dozens of tokens. The same as 5 sequential tool calls would burn through orchestration latency and intermediate results.
  • Code is a compact plan. It encodes branching, looping, error handling β€” none of which a flat tool surface gives you.
  • TypeScript types are the unlock. Types are an extremely concise format for representing inputs and outputs in a way the model can reason about. Generated from your existing OpenAPI spec, they become the agent's API contract.
  • Models keep getting better at code. Code-mode systems benefit from every model release for free; tool-call systems don't compound nearly as well.

What Cloudflare actually shipped

  • Code Mode blog post (summer 2025): "How we gave agents an entire API in 1,000 tokens."
  • The Cloudflare MCP server is built on this β€” read-only access to all 2,000+ Cloudflare API endpoints from a single MCP connection.
  • WorkerD β€” Cloudflare's open-source reference implementation of the dynamic-worker primitive. Same V8-isolate technology Cloudflare runs at scale, available locally.

Equivalents in other ecosystems:

  • Deno β€” deno run --allow-net=domain.com style permission flags.
  • Pydantic Monty β€” Python code interpreter; heavier (Python boot cost) but viable for Python ecosystems.

Where this is going (Carey's predictions)

Server side

  • More sandbox primitives shipping. WorkerD, Deno, Monty are first wave; this becomes a standard infra primitive.
  • APIs must be ready to "take a beating." Agents in for-loops across sandboxes will hammer endpoints. Rate limiting becomes load-bearing, not optional.

Client side (where Carey thinks the real innovation will happen)

  • Programmatic tool calling lands in clients. Local clients can eval() directly; cloud clients use the same sandbox primitives as servers. Either way, "run untrusted code" becomes the default tool call.
  • Saved mini-scripts. A user does an action via agent-generated code; the client offers to save that code as a reusable mini-script. Set it on a cron. When the script breaks (e.g., a scraping target changes), the agent self-heals it.
  • More MCP clients, because the TypeScript SDK is being rewritten lightweight enough to fit any bundle. Carey's team is doing this work directly. Clients will get more diverse UI/UX.
  • MCP as middleware. By end of 2026, every major TypeScript framework will ship MCP=true as a single flag on existing API services. The SDK will be small enough to bundle natively. Your existing 1,000 Next.js endpoints get exposed over MCP automatically β€” and code mode means the client can use all of them without context explosion.

Why MCP clients are hard today

  • Stateful connections (annoying).
  • Resumability between connections (very annoying).
  • Generally a pain to maintain.
  • Result: most clients are stripped down, offload everything to the bare-bones SDK, and there's been little innovation in the UI/UX layer.

The fix in flight: stateless transport (in the June 2026 spec) + a lightweight SDK = much lower bar to building good clients.

Where we were vs where we're going

| Era | Pattern |
|-----|---------|
| Pre-MCP | Each agent bundles its own Gmail tools, Slack tools, etc. Re-built per agent. |
| Early MCP | Service providers ship one MCP server per product. ~8 tools each, manageable. |
| Mid-2025 | Companies split into 16 servers because 2,000 tools breaks context. Incomplete coverage. |
| 2026 (now) | Progressive discovery via tool search or code mode. CLIs popular but need shell. |
| Late 2026 (predicted) | MCP=true flag in TS frameworks; one-tool code mode by default; saved mini-scripts; many more MCP clients. |

My takeaways

  1. Stop publishing one MCP server per product. If you're a platform with a real API, the right path is one MCP server + code mode, not N small servers with cherry-picked endpoints.
  2. Generate a typed SDK from your OpenAPI spec. This is the unlock. TypeScript types are the agent's API contract. Same source-of-truth as your existing API docs.
  3. Pick a sandbox primitive. Cloudflare WorkerD, Deno (with permission flags), or Pydantic Monty — choose based on language and infra. Have programmable guardrails (network access, domain allowlist, secret access) as defaults.
  4. Add aggressive rate limiting now. Agents in for-loops across sandboxes will hammer your API. Rate limiting was nice-to-have; in 2026 it's load-bearing.
  5. For clients you build: implement programmatic tool calling and saved mini-scripts. The user-facing innovation is here.
  6. Watch for the MCP=true flag. When TypeScript frameworks add native MCP middleware (Carey predicts late 2026), retire your bespoke MCP servers in favor of the framework integration.
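Takeaway 2 in miniature: one OpenAPI operation collapses to a typed signature the model can read in a handful of tokens. This mapping is hand-written to show the shape; real generators differ in detail, and the field names are examples, not Cloudflare's schema.

```typescript
// Hand-written illustration of what a generator might emit for one OpenAPI
// operation (GET /zones/{zone_id}/dns_records). Field names are examples only.
interface DnsRecord {
  id: string;
  type: "A" | "AAAA" | "CNAME" | "TXT";
  name: string;
  content: string;
}

interface DnsRecordsApi {
  // The entire contract the agent needs: path param in, typed records out.
  listDnsRecords(zoneId: string): Promise<DnsRecord[]>;
}
```

A few lines like these replace the hundreds of tokens a JSON-schema tool definition spends saying the same thing, and they come from the same source of truth as your API docs.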

Notable quotes

"MCP's a protocol. All of these can be exposed over MCP. We just shouldn't be dumping loads of tools into context. That's, like, the main thing."

"We were trying to give access to the whole of the Cloudflare API to agents. You try and make naive tools that have every single API endpoint and you fully explode a context window."

"Code is actually a very compact plan. Instead of doing tool calls, you can have one tool called code where the model generates the code of your choice and then you run it. And that code has so many more degrees of freedom than an individual tool call."

"Your APIs have to be ready to take a beating, because they have to have good rate limiting. I can run this in a for-loop on multiple sandboxes at once and just hammer your API."

"I think we're going to see MCP as a middleware. When you build an API service, it will be a flag that you can flag on in your favorite framework — MCP=true on all of your APIs."

"The 1950s, when you wanted to run something on a computer in your local town, you printed out some punch cards and you stamped them and you gave them to the guy. That was kind of like running untrusted code. Now we're going much more back to that β€” your users can write code, because your users are AI. And AI is very good at writing code."

Connection points

  • Carey's "one run_code tool beats hundreds of function tools" maps almost exactly onto the practice I described in How I Run Claude Code β€” prefer a skill that wraps curl over a heavy MCP server when both can do the job. The Slack-via-curl skill is essentially code mode at the client side: my agent writes the curl, the harness runs it. Saves context, easier to debug, fewer tokens.
  • Pairs with Agent Harness Engineering β€” Synthesis β€” Carey is talking about the connectivity layer; the harness work is about making the codebase legible to agents in the first place. Both converge on "let the model write code against typed contracts" as the load-bearing primitive.
  • The "saved mini-scripts" prediction is the same shape as my claude-autoresearch plugin and agent-orchestrator daemon β€” durable agent runs that survive context resets, only at a smaller per-call granularity.

Additional resources

  • Cloudflare Code Mode blog post β€” "How we gave agents an entire API in 1,000 tokens"
  • Cloudflare MCP server β€” read-only access to all Cloudflare APIs, the reference implementation
  • WorkerD β€” open-source dynamic-worker runtime, V8-isolate sandbox primitive
  • Deno β€” alternative sandbox primitive with permission flags
  • The MCP TypeScript SDK β€” actively being rewritten lightweight